
    A new pairwise kernel for biological network inference with support vector machines

    BACKGROUND: Much recent work in bioinformatics has focused on the inference of various types of biological networks, representing gene regulation, metabolic processes, protein-protein interactions, etc. A common setting involves inferring network edges in a supervised fashion from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression, etc.). RESULTS: Here, we distinguish between two modes of inference in this setting: direct inference based upon similarities between nodes joined by an edge, and indirect inference based upon similarities between one pair of nodes and another pair of nodes. We propose a supervised approach for the direct case by translating it into a distance metric learning problem. A relaxation of the resulting convex optimization problem leads to the support vector machine (SVM) algorithm with a particular kernel for pairs, which we call the metric learning pairwise kernel. This new kernel for pairs can easily be used by most SVM implementations to solve problems of supervised classification and inference of pairwise relationships from heterogeneous data. We demonstrate, using several real biological networks and genomic datasets, that this approach often improves upon the state-of-the-art SVM for indirect inference with another pairwise kernel, and that the combination of both kernels always improves upon each individual kernel. CONCLUSION: The metric learning pairwise kernel is a new formulation to infer pairwise relationships with SVM, which provides state-of-the-art results for the inference of several biological networks from heterogeneous genomic data.
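
    For concreteness, the sketch below shows how pairwise kernels of this kind can be evaluated from an ordinary node kernel. The closed forms used here are the commonly cited formulations of the metric learning pairwise kernel (MLPK) and the tensor product pairwise kernel (TPPK); they are assumed to match the paper's definitions, and the base kernel and node data are toy placeholders.

```python
import numpy as np

def mlpk(K, i, j, k, l):
    """Metric learning pairwise kernel between node pairs (i, j) and (k, l),
    built from a base node kernel matrix K (common formulation, assumed here)."""
    return (K[i, k] - K[i, l] - K[j, k] + K[j, l]) ** 2

def tppk(K, i, j, k, l):
    """Tensor product pairwise kernel, the usual baseline for indirect inference."""
    return K[i, k] * K[j, l] + K[i, l] * K[j, k]

# Toy base kernel over 5 nodes (e.g. an RBF kernel on genomic feature vectors).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.1 * sq_dists)

pairs = [(0, 1), (2, 3), (1, 4)]   # candidate edges (node pairs)
G_mlpk = np.array([[mlpk(K, a, b, c, d) for (c, d) in pairs] for (a, b) in pairs])
print(G_mlpk)  # Gram matrix over pairs, usable by any SVM that accepts precomputed kernels
```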

    Modeling recursive RNA interference.

    An important application of the RNA interference (RNAi) pathway is its use as a small RNA-based regulatory system, commonly exploited to suppress expression of target genes to test their function in vivo. In several published experiments, RNAi has been used to inactivate components of the RNAi pathway itself, a procedure termed recursive RNAi in this report. The theoretical basis of recursive RNAi is unclear, since the procedure could potentially be self-defeating, and in practice the effectiveness of recursive RNAi in published experiments is highly variable. A mathematical model for recursive RNAi was developed and used to investigate the range of conditions under which the procedure should be effective. The model predicts that the effectiveness of recursive RNAi is strongly dependent on the efficacy of RNAi at knocking down target gene expression. This efficacy is known to vary greatly between different cell types, and comparison of the model predictions to published experimental data suggests that variation in RNAi efficacy may be the main cause of discrepancies between published recursive RNAi experiments in different organisms. The model suggests potential ways to optimize the effectiveness of recursive RNAi, both for screening of RNAi components and for improved temporal control of gene expression in switch-off/switch-on experiments.
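
    The abstract does not reproduce the model's equations, so the toy ODE below only illustrates the self-referential structure it describes: an RNAi component whose extra degradation depends on the activity of the very pathway it sustains. All parameter names and functional forms here are hypothetical, not the published model.

```python
# Toy, illustrative ODE only -- NOT the published model; parameters are hypothetical.
from scipy.integrate import solve_ivp

alpha, delta = 1.0, 0.1   # synthesis and basal turnover of the RNAi component R
K_half = 2.0              # half-saturation of pathway activity

def recursive_rnai(t, y, eta):
    R = y[0]
    activity = R / (K_half + R)                   # silencing capacity depends on R itself
    dR = alpha - delta * R - eta * activity * R   # extra degradation via RNAi against R
    return [dR]

for eta in (0.05, 0.5, 5.0):                      # low vs. high RNAi efficacy
    sol = solve_ivp(recursive_rnai, (0, 200), [alpha / delta], args=(eta,), max_step=1.0)
    print(f"efficacy {eta}: steady-state R ~ {sol.y[0, -1]:.2f}")
```

    Even this crude sketch reproduces the qualitative point of the abstract: when the knockdown efficacy is low, the component stays near its unperturbed level, while high efficacy drives it down despite the pathway depending on it.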

    Classification of microarray data using gene networks

    BACKGROUND: Microarrays have become extremely useful for analysing genetic phenomena, but establishing a relation between microarray analysis results (typically a list of genes) and their biological significance is often difficult. Currently, the standard approach is to map the results a posteriori onto gene networks in order to elucidate the functions perturbed at the level of pathways. However, integrating a priori knowledge of the gene networks could help in the statistical analysis of gene expression data and in their biological interpretation. RESULTS: We propose a method to integrate a priori knowledge of a gene network into the analysis of gene expression data. The approach is based on the spectral decomposition of gene expression profiles with respect to the eigenfunctions of the graph, resulting in an attenuation of the high-frequency components of the expression profiles with respect to the topology of the graph. We show how to derive unsupervised and supervised classification algorithms for expression profiles, resulting in classifiers with biological relevance. We illustrate the method with the analysis of a set of expression profiles from irradiated and non-irradiated yeast strains. CONCLUSION: Including a priori knowledge of a gene network in the analysis of gene expression data leads to good classification performance and improved interpretability of the results.
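
    A minimal sketch of the kind of graph-spectral smoothing the abstract describes, assuming a Laplacian eigenbasis and an exponential attenuation of high-frequency components; the toy network and the attenuation function are placeholders rather than the paper's exact choices.

```python
# Illustrative graph-spectral smoothing of an expression profile (assumptions noted above).
import numpy as np

A = np.array([[0, 1, 1, 0],      # toy gene network adjacency (4 genes)
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
L = np.diag(A.sum(1)) - A                # graph Laplacian
lam, U = np.linalg.eigh(L)               # eigenvalues act as graph "frequencies"

beta = 1.0
attenuation = np.exp(-beta * lam)        # damp high-frequency (large-eigenvalue) modes

x = np.array([2.1, 1.8, -0.5, 3.0])      # one expression profile over the 4 genes
x_smooth = U @ (attenuation * (U.T @ x)) # filter in the graph Fourier domain
print(x_smooth)                          # smoothed profile, then fed to a classifier
```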

    In silico prioritisation of candidate genes for prokaryotic gene function discovery: an application of phylogenetic profiles

    Background: In silico candidate gene prioritisation (CGP) aids the discovery of gene functions by ranking genes according to an objective relevance score. While several CGP methods have been described for identifying human disease genes, corresponding methods for prokaryotic gene function discovery are lacking. Here we present two prokaryotic CGP methods, based on phylogenetic profiles, to assist with this task. Results: Using gene occurrence patterns in sample genomes, we developed two CGP methods (statistical and inductive CGP) to assist with the discovery of bacterial gene functions. Statistical CGP exploits the differences in gene frequency between phenotypic groups, while inductive CGP applies supervised machine learning to identify gene occurrence patterns across genomes. Three rediscovery experiments were designed to evaluate the CGP frameworks. The first experiment attempted to rediscover peptidoglycan genes with 417 published genome sequences. Both CGP methods achieved best areas under the receiver operating characteristic curve (AUC) of 0.911 in the Escherichia coli K-12 (EC-K12) genome and 0.978 in the Streptococcus agalactiae 2603 (SA-2603) genome, with an average improvement in precision of >3.2-fold and a maximum of >27-fold using statistical CGP. A median AUC of >0.95 could still be achieved with as few as 10 genomes in each group in the rediscovery of the peptidoglycan metabolism genes. In the second experiment, a maximum of 109-fold improvement in precision was achieved in the rediscovery of anaerobic fermentation genes in EC-K12. The last experiment attempted to rediscover genes from 31 metabolic pathways in SA-2603, where 14 pathways achieved AUC >0.9 and 28 pathways achieved AUC >0.8 with the best inductive CGP algorithms. Conclusion: Our results demonstrate that the two CGP methods can assist with the study of functionally uncategorised genomic regions and the discovery of bacterial gene-function relationships. Our rediscovery experiments also provide a set of standard tasks against which future methods may be compared.
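
    As an illustration of the statistical CGP idea, the sketch below ranks genes by how strongly their presence/absence pattern across genomes differs between two phenotypic groups. The scoring statistic (Fisher's exact test) and the toy data are stand-ins; the paper's exact statistic may differ.

```python
# Hedged illustration of statistical candidate gene prioritisation from phylogenetic profiles.
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)
n_genomes, n_genes = 40, 6
profiles = rng.integers(0, 2, size=(n_genomes, n_genes))  # 1 = gene family present
phenotype = np.array([1] * 20 + [0] * 20)                  # trait-positive vs. trait-negative genomes

scores = []
for g in range(n_genes):
    present = profiles[:, g]
    table = [[np.sum((present == 1) & (phenotype == 1)),
              np.sum((present == 1) & (phenotype == 0))],
             [np.sum((present == 0) & (phenotype == 1)),
              np.sum((present == 0) & (phenotype == 0))]]
    _, p = fisher_exact(table)        # smaller p = stronger phenotype association
    scores.append(p)

ranking = np.argsort(scores)          # most phenotype-associated genes first
print(ranking)
```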

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
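
    As one concrete instance of the settings covered by such a framework, the snippet below fits a simple multi-label classifier by wrapping an independent model per target. This is only the most basic MTP baseline for illustration, not a method from the survey itself, and the data are synthetic.

```python
# Minimal multi-target (multi-label) prediction example with scikit-learn.
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))                        # 100 instances, 8 features
Y = (X @ rng.normal(size=(8, 3)) > 0).astype(int)    # 3 binary target variables

model = MultiOutputClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
print(model.predict(X[:5]))                          # one prediction per target, per instance
```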

    Application of kernel functions for accurate similarity search in large chemical databases

    Background: Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to do the query. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases due to the high computational complexity and the difficulties in indexing similarity search for large databases. Results: To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we utilize a hash table to support the new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with smaller indexing size and faster query processing time compared to state-of-the-art indexing methods such as Daylight fingerprints, C-tree and GraphGrep. Conclusions: Efficient similarity query processing for large chemical databases is challenging, since we need to balance running time efficiency and similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. An experimental study validates the utility of G-hash in chemical databases.
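
    The sketch below is not G-hash itself, but it illustrates the general idea the abstract describes: map each chemical graph to hashed local descriptors, keep the descriptor counts in a hash table, and rank database entries by a kernel-like similarity for nearest-neighbor search. The descriptor definition and similarity function are simplified stand-ins.

```python
# Illustrative only -- not the G-hash algorithm; a toy hash-table similarity search.
from collections import Counter

def node_descriptors(graph):
    """graph: dict node -> (label, list of neighbor nodes). A descriptor is the node
    label plus the sorted labels of its neighbors (a crude 1-hop environment)."""
    descs = []
    for node, (label, nbrs) in graph.items():
        env = tuple(sorted(graph[n][0] for n in nbrs))
        descs.append((label, env))
    return Counter(descs)

def similarity(c1, c2):
    """Histogram-intersection style similarity over hashed descriptor counts."""
    return sum(min(c1[k], c2[k]) for k in c1.keys() & c2.keys())

# Two toy molecular graphs (labels are element symbols, hydrogens omitted).
ethanol = {0: ("C", [1]), 1: ("C", [0, 2]), 2: ("O", [1])}
methanol = {0: ("C", [1]), 1: ("O", [0])}

index = {"ethanol": node_descriptors(ethanol), "methanol": node_descriptors(methanol)}
query = node_descriptors(methanol)
print(sorted(index, key=lambda name: -similarity(index[name], query)))  # nearest first
```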

    The impact of cyclin-dependent kinase 5 depletion on poly(ADP-ribose) polymerase activity and responses to radiation

    Cyclin-dependent kinase 5 (Cdk5) has been identified as a determinant of sensitivity to poly(ADP-ribose) polymerase (PARP) inhibitors. Here, the consequences of its depletion on cell survival, PARP activity, the recruitment of base excision repair (BER) proteins to DNA damage sites, and overall DNA single-strand break (SSB) repair were investigated using isogenic HeLa stably depleted (KD) and Control cell lines. Synthetic lethality achieved by disrupting PARP activity in Cdk5-deficient cells was confirmed, and the Cdk5KD cells were also found to be sensitive to the killing effects of ionizing radiation (IR) but not to methyl methanesulfonate or neocarzinostatin. The recruitment profiles of GFP-PARP-1 and XRCC1-YFP to DNA damage sites in micro-irradiated Cdk5KD cells were slower and reached lower maximum values, while the profile of GFP-PCNA recruitment was faster and attained higher maximum values compared to Control cells. Higher basal, IR-induced, and hydrogen peroxide-induced polymer levels were observed in Cdk5KD compared to Control cells. Recruitment of a GFP-PARP-1 construct in which serines 782, 785, and 786, potential Cdk5 phosphorylation targets, were mutated to alanines was also reduced in micro-irradiated Control cells. We hypothesize that Cdk5-dependent PARP-1 phosphorylation on one or more of these serines results in an attenuation of its ribosylating activity, facilitating persistence at DNA damage sites. Despite these deficiencies, Cdk5KD cells are able to effectively repair SSBs, probably via the long-patch BER pathway, suggesting that the enhanced radiation sensitivity of Cdk5KD cells is due to a role of Cdk5 in other pathways or to the altered polymer levels.

    Statistical learning of peptide retention behavior in chromatographic separations: a new kernel-based approach for computational proteomics

    Background: High-throughput peptide and protein identification technologies have benefited tremendously from strategies based on tandem mass spectrometry (MS/MS) in combination with database searching algorithms. A major problem with existing methods lies within the significant number of false positive and false negative annotations. So far, standard algorithms for protein identification do not use the information gained from separation processes usually involved in peptide analysis, such as retention time information, which is readily available from chromatographic separation of the sample. Identification can thus be improved by comparing measured retention times to predicted retention times. Current prediction models are derived from a set of measured test analytes, but they usually require large amounts of training data. Results: We introduce a new kernel function which can be applied in combination with support vector machines to a wide range of computational proteomics problems. We show the performance of this new approach by applying it to the prediction of peptide adsorption/elution behavior in strong anion-exchange solid-phase extraction (SAX-SPE) and ion-pair reversed-phase high-performance liquid chromatography (IP-RP-HPLC). Furthermore, the predicted retention times are used to improve spectrum identifications by a p-value-based filtering approach. The approach was tested on a number of different datasets and shows excellent performance while requiring only very small training sets (about 40 peptides instead of thousands). Using the retention time predictor in our retention time filter improves the fraction of correctly identified peptide mass spectra significantly. Conclusion: The proposed kernel function is well suited for the prediction of chromatographic separation in computational proteomics and requires only a limited amount of training data. The performance of this new method is demonstrated by applying it to peptide retention time prediction in IP-RP-HPLC and prediction of peptide sample fractionation in SAX-SPE. Finally, we incorporate the predicted chromatographic behavior in a p-value-based filter to improve peptide identifications based on liquid chromatography-tandem mass spectrometry.
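
    A hedged sketch of the overall workflow: train a support vector regressor on peptide features, predict retention times, and reject identifications whose observed retention time deviates too far from the prediction. The amino-acid-composition features, RBF kernel, toy data, and fixed tolerance are stand-ins for the paper's specialized kernel and p-value-based filter.

```python
# Illustrative retention-time prediction and filtering (assumptions noted above).
import numpy as np
from sklearn.svm import SVR

AA = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Length-normalized amino-acid composition as a simple feature vector."""
    counts = np.array([peptide.count(a) for a in AA], float)
    return counts / max(len(peptide), 1)

train_peps = ["ACDK", "LLGMK", "EEDR", "FVVWK", "GGSTK"]   # toy training peptides
train_rt = np.array([5.2, 14.1, 4.0, 18.5, 6.7])           # toy retention times (min)

model = SVR(kernel="rbf", C=10.0).fit([composition(p) for p in train_peps], train_rt)

observed_rt, peptide = 15.0, "LLAMK"                        # one candidate identification
predicted_rt = model.predict([composition(peptide)])[0]
tolerance = 3.0                                             # acceptance window (min)
keep = abs(observed_rt - predicted_rt) <= tolerance         # crude stand-in for the p-value filter
print(predicted_rt, keep)
```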